Abstract. In the deterministic context Bakushinskii's theorem excludes the existence 
of purely data driven convergent regularization for ill-posed problems. We will prove 
in the present work that in the statistical setting we can either construct a 
counterexample or develop an equivalent formulation depending on the considered class of 
probability distributions. Hence, Bakushinskii's theorem does not generalize to the 
statistical context, although this has often been assumed in the past. To arrive at this 
conclusion, we will deduce from the classic theory new concepts for a general study 
of statistical inverse problems and perform a systematic clarification of the key ideas 
of statistical regularization. 



1 Introduction 

We consider statistical inverse problems, where an unknown signal $x$ should be reconstructed from 
indirect noisy measurements $y_{\mathrm{noise}} = Tx + \mathrm{noise}$. The problem is assumed to be ill-posed, i.e. the 
operator $T$ is not continuously invertible, so that we can only approximate the signal. In classic 
inverse problems the noise is supposed to be deterministic and bounded. Nevertheless it is well 
known that various applications cannot be modeled appropriately in this way. Therefore, stochastic 
models have been introduced, where the noise is taken as a random variable or a stochastic process. 
In some studies, e.g. Jiol \vk liitl. not only the noise but also the operator or the 
signal are stochastic. 

In both the deterministic and the stochastic setting one crucial point is the knowledge of the noise 
level, which is often not available in applications. However, the Bakushinskii veto [1] states for 
classic inverse problems the equivalence of the ill-posedness of the problem and the nonexistence 
of purely data driven reconstruction methods for which the approximated solution tends to the 
exact signal $x$ when the noise vanishes. This theorem is of particular importance since it establishes 
the need for supplemental information, as for instance the noise level. 

For statistical inverse problems the situation is ambiguous as we will discuss in the paper at hand. 
To study the existence of such reconstruction methods we need explicit definitions of the involved 
objects. While an extensive theory for classic inverse problems has been developed [11, 15, 32], 
only selected aspects of statistical inverse problems have been analyzed so far. Additional 
difficulties arise from the possible unboundedness of stochastic noise, which calls for new error 
and convergence criteria |E, 12, 18, 27]. Cavalier explained in [9] how concepts of nonparametric 
statistics, e.g. the white noise model, risk estimation and model selection, can be applied to inverse 
problems. We will proceed in reverse by studying how the key ideas of the classic inversion theory 
have to be modified to be suitable for a statistical setting. 

First of all we give a brief recapitulation of the classic regularization theory, in which we suggest in 
particular a reduction of the usually required convergence properties. Our statistical setting is 
introduced in section 3.1, followed by the presentation of the main concepts and central definition 
in section 3.2. There we propose to link the noise to the asymptotic of the noise level, which will 
turn out to be the deciding idea for definition 3.20 of convergent statistical regularization methods 
and our main result stated in section 4. We prove an equivalent formulation and give a 
counterexample to Bakushinskii's theorem depending on the considered class of probability distributions. 



2 Classic inverse problems 

We consider the usual setting of classic inverse problems. Let $H_1$ and $H_2$ denote separable Hilbert 
spaces with scalar products $\langle\cdot,\cdot\rangle_{H_i}$ and the induced norms $\|\cdot\|_{H_i}$, $i = 1,2$. Further let $T : H_1 \to H_2$ 
be a linear, compact and bounded operator with a nonclosed range $\mathcal{R}(T)$. We are interested in the 
problem 

$$ y_\delta = Tx + \delta\xi, \tag{2.1} $$

where $x \in H_1$ denotes the unknown signal, $\delta > 0$ is the noise level and the normalized noise $\xi \in H_2$ 
satisfies $\|\xi\|_{H_2} \le 1$. With $\ker(T)^\perp$ as orthogonal complement of the kernel of $T$ we can define the 
generalized inverse $T^+$ as the linear extension of the inverse of $T|_{\ker(T)^\perp}$. A motivation and some 
properties of the generalized inverse can be found e.g. in [11]. Since the range of $T$ is assumed to 
be nonclosed, $T^+$ is discontinuous and $x^+ := T^+ y \in H_1$ has to be regularized. 

In the following subsection we will not present the common definition of (convergent) regularization 
methods given in [11], but the definitions introduced by Hofmann and Mathé in [19]. Research 
has shown that purely data driven regularization methods can yield remarkably good results, see 
for instance OLEl]], although these methods are not convergent as the Bakushinskii veto proves. 
This teaches us to distinguish convergent and arbitrary regularization schemes as is done in the 
following approach. 



2.1 Linear and convergent regularization schemes 



Notation 2.1 (Singular value decomposition (SVD) of $T$ [11]). Let $\{(s_j; v_j, u_j)\}_{j\in\mathbb{N}}$ denote the 
singular system of the operator $T$, where $\{s_j\}_{j\in\mathbb{N}}$ is arranged in decreasing order with $\lim_{j\to\infty} s_j = 0$. 
The following series expansion holds: 

$$ Tx = \sum_{j \ge 1} s_j \langle x, v_j\rangle_{H_1} u_j. $$

Definition 2.2 (Linear regularization [19]). A family $F := \{F_\alpha\}_{\alpha>0}$ of linear and bounded 
operators $F_\alpha : (0, \|T\|^2] \to \mathbb{R}$ is called regularization (filter) if the following properties hold: 

1. The associated bias family $\{b_\alpha\}_{\alpha>0}$, where $b_\alpha(\vartheta) := 1 - \vartheta F_\alpha(\vartheta)$, converges pointwise to 
zero: $\lim_{\alpha\to 0} b_\alpha(s_j^2) = 0$ for all $s_j > 0$. 

2. The bias family is uniformly bounded by some $\gamma_0 > 0$, i.e. $\sup_{\alpha>0}\sup_{s_j>0} |b_\alpha(s_j^2)| \le \gamma_0$. 

3. There is a constant $\gamma_* > 0$ such that the parameter family can be normalized for all $\alpha \in (0,\infty)$ 
and $s_j > 0$ by $s_j F_\alpha(s_j^2) \le \gamma_*/\sqrt{\alpha}$. 

In this case, the family $R := \{R_\alpha\}_{\alpha>0}$ of linear and bounded operators $R_\alpha : H_2 \to H_1$ with 

$$ R_\alpha y := x_\alpha := F_\alpha(T^*T)T^*y = \sum_{s_j>0} F_\alpha(s_j^2)\, s_j \langle y, u_j\rangle_{H_2} v_j, \quad y \in H_2, \tag{2.2} $$

is called linear regularization scheme (in short: regularization), where the last equation follows 
from the functional calculus described in [11]. 

Notation 2.3. Below, we will use without further comments the notations 

$$ F := \{F_\alpha\}_{\alpha>0} \quad\text{and}\quad R := \{R_\alpha\}_{\alpha>0}. $$

Example 2.4. The given definition is satisfied by many of the known linear regularizations in terms 
of [11], such as spectral cut-off, which is defined by 

$$ F_\alpha(\vartheta) := \vartheta^{-1}\chi_{[\alpha, \|T\|^2]}(\vartheta) \quad\text{such that}\quad x_\alpha = R_\alpha y = \sum_{s_j^2 \ge \alpha} s_j^{-1}\langle y, u_j\rangle_{H_2} v_j, $$

where $\chi$ denotes the indicator function, $\alpha, \vartheta \in (0, \|T\|^2]$, and Tikhonov regularization with 
$F_\alpha(\vartheta) := 1/(\alpha + \vartheta)$, such that $x_\alpha = R_\alpha y = (\alpha I + T^*T)^{-1}T^*y$. 
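
The two filters of Example 2.4 can be checked numerically against the properties of Definition 2.2. The following is a minimal sketch, not part of the original analysis: the singular values $s_j = 1/j$ and the tested values of $\alpha$ are hypothetical choices made only for illustration.

```python
import math

def spectral_cutoff(alpha, theta):
    """F_alpha(theta) = 1/theta on [alpha, ||T||^2], zero below alpha."""
    return 1.0 / theta if theta >= alpha else 0.0

def tikhonov(alpha, theta):
    """F_alpha(theta) = 1 / (alpha + theta)."""
    return 1.0 / (alpha + theta)

def bias(filt, alpha, theta):
    """Bias function b_alpha(theta) = 1 - theta * F_alpha(theta)."""
    return 1.0 - theta * filt(alpha, theta)

# Hypothetical singular values s_j = 1/j of a compact operator with ||T|| = 1.
singular_values = [1.0 / j for j in range(1, 201)]

for name, filt in (("cut-off", spectral_cutoff), ("Tikhonov", tikhonov)):
    for alpha in (1e-1, 1e-3, 1e-5):
        # Property (2): the bias stays uniformly bounded.
        worst_bias = max(abs(bias(filt, alpha, s * s)) for s in singular_values)
        # Property (3): s_j * |F_alpha(s_j^2)| * sqrt(alpha) stays bounded.
        gamma_star = max(s * abs(filt(alpha, s * s)) * math.sqrt(alpha)
                         for s in singular_values)
        print(f"{name:9s} alpha={alpha:.0e}  sup|b|={worst_bias:.3f}  "
              f"gamma*~{gamma_star:.3f}")
```

For Tikhonov regularization the printed constant cannot exceed 1/2, since $s\sqrt{\alpha} \le (\alpha + s^2)/2$; for spectral cut-off it cannot exceed 1.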

Remark 2.5. 

1. Later on we will require a stricter bound instead of property (3) of definition 2.2: 

$$ \sup_{0<\vartheta\le\|T\|^2} |F_\alpha(\vartheta)| \le \frac{\gamma}{\alpha}, \quad \gamma > 0. \tag{2.3} $$

It is easy to show that the given examples satisfy this property too. In [32] it is shown that 
(3) follows if (2) and (2.3) hold. 
2. As generalization we could also require that the index family of $F$ is an arbitrary subset of 
the real numbers with at least one accumulation point, say $h \in \mathbb{R}$. Then property (1) has 
to be reformulated in the following way: $\lim_{\alpha\to h} b_\alpha(s_j^2) = 0$ for all $s_j > 0$. We cannot skip it 
completely because it yields the following important proposition. 



Proposition 2.6 (Pointwise convergence to $T^+$ [23]). Let $R$ denote a linear regularization and 
$\mathcal{D}(T^+)$ the domain of the generalized inverse $T^+$ of $T$. 

1. If $y \in \mathcal{D}(T^+)$, then $\sup_{\alpha>0} \|x_\alpha\|_{H_1} < \infty$ and $x_\alpha = R_\alpha y \to T^+y$ when $\alpha \to 0$. 

2. If $y \notin \mathcal{D}(T^+)$, then $\lim_{\alpha\to 0} \|x_\alpha\|_{H_1} = \infty$. 

In particular we get for all $y \in H_2$ that $\lim_{\alpha\to 0} TR_\alpha y = TT^+y = Qy$, where $Q : H_2 \to \overline{\mathcal{R}(T)}$ denotes 
the orthogonal projection onto $\overline{\mathcal{R}(T)}$. 



Remark 2.7. A similar result can be found in [11, proposition 3.6]. 
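
The dichotomy of Proposition 2.6 can be illustrated with Tikhonov regularization in the singular basis. This is a sketch under assumptions: the singular values $s_j = 1/j$, the two data vectors and the truncation at $n$ coefficients are hypothetical, and a finite truncation can only indicate the growth of $\|x_\alpha\|$, not show divergence.

```python
import math

def tikhonov_norm(y_coeffs, s_vals, alpha):
    """||x_alpha|| for x_alpha = sum_j s_j / (alpha + s_j^2) <y, u_j> v_j."""
    return math.sqrt(sum((s / (alpha + s * s) * y) ** 2
                         for s, y in zip(s_vals, y_coeffs)))

n = 2000
s_vals = [1.0 / j for j in range(1, n + 1)]        # hypothetical s_j = 1/j
# y = Tx with x_j = 1/j, hence y lies in the domain of T^+.
y_in = [s / j for j, s in enumerate(s_vals, 1)]
# <y, u_j> = 1/j gives <y, u_j> / s_j = 1, which is not square summable.
y_out = [1.0 / j for j in range(1, n + 1)]

for alpha in (1e-2, 1e-4, 1e-6):
    print(f"alpha={alpha:.0e}  in-domain: {tikhonov_norm(y_in, s_vals, alpha):.3f}"
          f"  outside: {tikhonov_norm(y_out, s_vals, alpha):.3f}")
```

The first column stays below $\|x\|$, while the second keeps growing as $\alpha$ decreases.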

Convergence in general and especially convergence rates are established quality criteria for the 
comparison of regularization schemes. Normally, one requires that the regularized solution $x_\alpha$ should 
converge uniformly to the exact one if the error tends to zero: 

Definition 2.8 (Parameter choice [32]). Let $R$ denote a linear regularization scheme and 
$\alpha : (0,\infty) \times H_2 \to (0,\infty)$ a function. If for all $y \in \mathcal{D}(T^+)$ it holds 

$$ \lim_{\delta\to 0} \Big( \sup\{ \alpha(\delta, y_\delta) : y_\delta \in H_2,\ \|y - y_\delta\|_{H_2} \le \delta \} \Big) = 0, $$

then $\alpha$ is called (classic) parameter choice. In particular we will say: 

1. $\alpha$ is purely data driven or heuristic if it depends only on the data, i.e. $\alpha = \alpha(y_\delta)$. 

2. $\alpha$ is (classic) convergent w.r.t. $R$ if for all $y \in \mathcal{D}(T^+)$ it holds 

$$ \lim_{\delta\to 0} \Big( \sup\{ \|T^+y - R_{\alpha(\delta,y_\delta)} y_\delta\|_{H_1} : y_\delta \in H_2,\ \|y - y_\delta\|_{H_2} \le \delta \} \Big) = 0. $$

The pair $(R, \alpha)$ of a linear regularization $R$ and a parameter choice $\alpha$ is called (classic) convergent 
regularization method of $T^+$ if $\alpha$ is convergent w.r.t. $R$. 

Notation 2.9. 

• Here, we applied the usual error criterion for classic inverse problems: 

$$ e(R,\alpha,x,\delta) := \sup\{ \|T^+y - R_{\alpha(\delta,y_\delta)} y_\delta\|_{H_1} : y_\delta \in H_2,\ \|y - y_\delta\|_{H_2} \le \delta \}, \quad\text{where } y = Tx. $$

• Many parameter choice strategies depend on the applied regularization scheme $R$, which is 
why we should write $\alpha(R,\delta,y_\delta)$. However, we will use for simplicity $\alpha(\delta,y_\delta)$ instead. 

Example 2.10. The discrepancy principle [11, 26] is a good example of a parameter choice which is 
very common for classic inverse problems but cannot be applied in the statistical setting as we will 
explain in remark 3.8. It chooses the regularization parameter for a given regularization scheme $R$ 
and a fixed constant $\tau > 1$ by setting 

$$ \alpha_* := \sup\{ \alpha \le \|T^*T\| : \|TR_\alpha y_\delta - y_\delta\| \le \tau\delta \}. $$

Therein and in most of the established convergent methods the knowledge of the noise level $\delta$ is 
needed. In contrast, the quasi-solution of Ivanov [20] yields convergent regularization assuming 
instead an upper bound for the norm of the solution $x_\alpha$. Well-known purely data driven 
parameter choices are the L-curve criterion of Hansen [16], the generalized cross-validation of 
Wahba [33] and quasi-optimality [30]. 
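
The discrepancy principle of Example 2.10 can be sketched for Tikhonov regularization in the singular basis. Everything concrete here is an assumption made for illustration: the singular values, the signal, the deterministic noise direction, the constant $\tau = 1.5$, and the restriction of the supremum to a geometric grid of parameters.

```python
import math

def tikhonov_residual(y_delta, s_vals, alpha):
    """||T R_alpha y_delta - y_delta|| for Tikhonov in the singular basis."""
    return math.sqrt(sum((alpha / (alpha + s * s) * yj) ** 2
                         for s, yj in zip(s_vals, y_delta)))

def discrepancy_alpha(y_delta, s_vals, delta, tau=1.5, q=0.8):
    """Largest alpha on the grid ||T||^2 * q^k with residual <= tau * delta."""
    alpha = max(s_vals) ** 2
    # The Tikhonov residual grows with alpha, so we shrink alpha until it fits.
    while tikhonov_residual(y_delta, s_vals, alpha) > tau * delta and alpha > 1e-16:
        alpha *= q
    return alpha

n = 500
s_vals = [1.0 / j for j in range(1, n + 1)]            # hypothetical s_j = 1/j
x_true = [1.0 / j for j in range(1, n + 1)]            # hypothetical signal
xi = [(-1.0) ** j / math.sqrt(n) for j in range(n)]    # bounded noise, ||xi|| = 1

for delta in (1e-1, 1e-2, 1e-3):
    y_delta = [s * x + delta * e for s, x, e in zip(s_vals, x_true, xi)]
    alpha = discrepancy_alpha(y_delta, s_vals, delta)
    x_alpha = [s / (alpha + s * s) * yj for s, yj in zip(s_vals, y_delta)]
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_alpha, x_true)))
    print(f"delta={delta:.0e}  alpha*={alpha:.3e}  error={err:.3f}")
```

Note that the rule needs $\delta$ as an input, which is exactly the supplemental information the Bakushinskii veto refers to.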

Theorem 2.11 (Bakushinskii veto [1]). A purely data driven (classic) convergent regularization 
method exists if and only if the generalized inverse $T^+$ is continuous. 

Proof sketch. With a purely data driven (classic) convergent regularization method $(R,\alpha)$ we get 
necessarily for exact data that $T^+y = R_{\alpha(y)}y$ for all $y \in \mathcal{D}(T^+)$, such that for arbitrary sequences 
$\{y_n\}_{n\in\mathbb{N}} \subset \mathcal{D}(T^+)$ with $\lim y_n = y$ it holds $\lim T^+y_n = \lim R_{\alpha(y_n)}y_n = T^+y$, which yields the 
well-posedness of the problem. □ 
2.2 Reduction of the requirements 

In the statistical setting we cannot require uniform convergence as we do in the deterministic 
context since the noise may be unbounded. The resulting question is whether the convergence 
criterion could also be weakened for classic inverse problems. We want to ensure that the approximated solution 
of the problem converges to the exact one if the noise tends to zero. But for that purpose we do 
not need to include the supremum as is done in definition 2.8. It is only a technical simplification. 
Additionally we want to skip the requirement that the function $\alpha$ has to converge to zero if the 
noise vanishes. In fact, it is unimportant how $\alpha$ behaves as long as (2.4) is satisfied. 

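Definition 2.12 (Generally convergent regularization). The pair $(R,\alpha)$ of a linear regularization 
$R$ and a function $\alpha : (0,\infty) \times H_2 \to (0,\infty)$ is called (generally) convergent regularization of $T^+$ if 
the regularized solution converges in the following sense to the exact one: For all $\{y^{(k)}\}_{k\ge 1}$ with 
$y^{(k)} := y + \delta^{(k)}\xi^{(k)}$, $\delta^{(k)} > 0$, $\|\xi^{(k)}\|_{H_2} \le 1$ and $\lim_{k\to\infty}\delta^{(k)} = 0$ we have 

$$ \lim_{k\to\infty} \big\| T^+y - R_{\alpha(\delta^{(k)},\,y^{(k)})}\, y^{(k)} \big\|_{H_1} = 0. \tag{2.4} $$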



Remark 2.13. In order to achieve an easier notation, one could be tempted to require only pointwise 
convergence. But this would mean to fix the noise and vary only the noise level, which forms a 
considerable and unrealistic restriction. 

Conclusion 2.14 (The Bakushinskii veto for general methods). As the supremum is not necessary 
for the proof of theorem 2.11, an equivalent formulation can be verified analogously for generally 
convergent regularization. 

3 Statistical inverse problems 

In this section we provide new concepts for a general study of statistical inverse problems. As main 
idea we link the noise to the asymptotic of the noise level varying its probability distribution. 

3.1 Statistical setting 

In recent publications about statistical inverse problems one can find two models of stochastic 
noise, random variables iQ/zLQ"!] and Hilbert-space processes. As every Hilbert-space valued 
random variable with finite second moment can be identified with a Hilbert-space process, we will 
concentrate mostly on the latter. 

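Definition 3.1. A Hilbert-space process is a linear and continuous operator 

$$ \Xi : H_2 \to L^2(\Omega,\mathcal{F},P), \quad v \mapsto \Xi v =: \langle \Xi, v\rangle_{H_2}, $$

where $(\Omega,\mathcal{F},P)$ denotes a probability space, $\mathcal{B}_{\mathbb{R}}$ the Borel-$\sigma$-algebra generated by the topological 
space $\mathbb{R}$ and 

$$ L^2(\Omega,\mathcal{F},P) := \{ Z : (\Omega,\mathcal{F},P) \to (\mathbb{R},\mathcal{B}_{\mathbb{R}}) \text{ square-integrable random variable} \}. $$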
Definition 3.2. The covariance $\mathrm{Cov}_\Xi : H_2 \to H_2$ of a Hilbert-space process $\Xi$ is implicitly defined 
by 

$$ \langle \mathrm{Cov}_\Xi\, y_1, y_2\rangle_{H_2} = \mathrm{Cov}\big(\langle \Xi, y_1\rangle_{H_2}, \langle \Xi, y_2\rangle_{H_2}\big), \quad y_1, y_2 \in H_2. $$
Hence it is a bounded and linear operator. 

Example 3.3. A centered Hilbert-space process $\Xi$ with the identity as covariance is called white 
noise process. In this case $\Xi$ is Gaussian if the associated random variables are Gaussian, i.e. if 
$\langle \Xi, v\rangle_{H_2} \sim \mathcal{N}(0, \|v\|_{H_2}^2)$. Inverse problems with Gaussian white noise have been studied 
elsewhere in the literature. 



Assumption 1. We assume $\Xi : H_2 \to L^2(\Omega,\mathcal{F},P)$ to be a centered Hilbert-space process with 
$\mathbb{E}[\langle \Xi, v\rangle_{H_2}] = 0$ for all $v \in H_2$ and $\|\mathrm{Cov}_\Xi\| < \infty$. 

Notation 3.4 (Observation model). Let $\Xi$ be as in assumption 1. We consider the following abstract 
observation model: 

$$ Y_\delta = y + \delta\Xi, \quad\text{where } y \in \mathcal{D}(T^+) \text{ and } \delta > 0. \tag{3.1} $$

Conclusion 3.5. The realizations of $\Xi$ and thus of $Y_\delta$ do not have to be in $H_2$ because $\Xi$ is only a 
weak random element of $H_2$. As a consequence several basic concepts have to be revised: 

Notation 3.6. We want to generalize the notation $P^\Xi$ of image measures from random variables 
to Hilbert-space processes. Let $\Xi$ be a Hilbert-space process. Then we interpret $P^\Xi$ as the 
probability measure which is well-defined by its finite-dimensional marginal distributions on the space 
$(\mathbb{R}^{H_2}, (\mathcal{B}_{\mathbb{R}})^{H_2})$, where $\mathbb{R}^{H_2}$ denotes the space of all functions $f : H_2 \to \mathbb{R}$ and $(\mathcal{B}_{\mathbb{R}})^{H_2}$ 
denotes the associated $\sigma$-algebra. The existence and uniqueness of $P^\Xi$ is ensured by the Kolmogorov 
extension theorem [28]. 



Definition 3.7 (Noise level). The definitions of the noise level of classic and statistical inverse 
problems differ significantly. The noise level $\delta$ of an inverse problem is defined as scale factor of 
the noise $\xi$, or accordingly $\Xi$, such that 

• $\|\xi\|_{H_2} \le 1$ and therefore $\|y - y_\delta\|_{H_2} \le \delta$ for all $\delta > 0$ if $y_\delta$ as in (2.1). 

• $\mathbb{E}\|\Xi\|_{H_2}^2 \le 1$ and therefore $\mathbb{E}\|y - Y_\delta\|_{H_2}^2 \le \delta^2$ for all $\delta > 0$ if $Y_\delta \in L^2(\Omega, H_2)$. 

• $\|\mathrm{Cov}_\Xi\|^{1/2} \le 1$ if $Y_\delta$ as in notation 3.4. 

Remark 3.8 (Discussion). From a statistical point of view, only the third case is of interest. For 
instance, the discrepancy principle described in example 2.10 cannot be applied to observations with 
white noise since the term $\|TR_\alpha Y_\delta(\omega) - Y_\delta(\omega)\|_{H_2}$ could be infinite. For observations with noise 
modelled as random variables it yields convergent methods by contrast. So, the second case is very 
close to the deterministic setting as we will support by proposition 3.26. 

For the deterministic context we defined the regularization operators between the observed Hilbert 
spaces. The following notation allows us to apply them also to Hilbert-space processes: 

Notation 3.9. We observe a Hilbert-space process $\Xi : H_2 \to L^2(\Omega,\mathcal{F},P)$ and a linear and bounded 
operator $R : H_2 \to H_1$. Then, we will interpret the composition $R\Xi$ as a Hilbert-space process on 
$H_1$, i.e. as $R\Xi : H_1 \to L^2(\Omega,\mathcal{F},P)$ with $v \mapsto R\Xi v =: \langle R\Xi, v\rangle_{H_1} = \langle \Xi, R^*v\rangle_{H_2}$. 

Remark 3.10. $R\Xi$ is well-defined, since $\langle \Xi, R^*v\rangle_{H_2} = \Xi(R^*v) \in L^2(\Omega,\mathcal{F},P)$. The linearity of $R$ 
yields further that $RY_\delta = Ry + \delta R\Xi$. 

As parameter choices do not have to be linear, we cannot interpret the term $\alpha(\delta, Y_\delta)$ in a similar 
way. That is why we will use, where necessary, the sequence space model, which was discussed 
for instance in |4i la UM. 

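Notation 3.11 (Sequence space model). Let $\{(s_j; v_j, u_j)\}_{j\ge 1}$ denote the singular system of the 
operator $T$. The sequence space model is defined by 

$$ Y_\delta(\omega) := \{Y_{\delta,j}(\omega)\}_{s_j>0} \quad\text{with}\quad Y_{\delta,j}(\omega) = \langle y, u_j\rangle_{H_2} + \delta\langle \Xi, u_j\rangle_{H_2}(\omega), \quad \omega \in \Omega. \tag{3.2} $$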

In applications only finite data are available, which is why we additionally introduce the following 
observation model, which is more realistic and has been studied for example in [25]: 

Notation 3.12 (Discretized data). Let us consider the one-sided discretization of $Y_\delta$: 

$$ QY_\delta = QTx + \delta Q\Xi = \sum_{j=1}^{n} Y_{\delta,j}\, w_j \quad\text{with}\quad Y_{\delta,j}(\omega) = \langle y, w_j\rangle_{H_2} + \delta\langle \Xi, w_j\rangle_{H_2}(\omega), \quad \omega \in \Omega, \tag{3.3} $$

where $Q$ denotes the projection onto the linear span of an orthonormal system $\{w_1, \dots, w_n\}$. 
Remark 3.13. 

• We assume to have observations without repetitions. 

• (3.3) conforms to the well-known regression model with orthonormal design. 

• It is evident that this model leads to a supplemental error term, the discretization error, which 
changes the convergence rates but not the underlying convergence behaviour if we require 
that $n = n(\delta)$ with $\lim_{\delta\to 0} n(\delta) = \infty$. 

To compare and qualify different methods we need an error criterion. Most authors use the mean 
squared error (MSE) and so will we. It is defined as follows: 

Notation 3.14 (Error criterion). Let $\Xi$ satisfy assumption 1. We set 

$$ \mathrm{MSE}(R,\alpha,x,\delta) := \Big( \mathbb{E}\,\|T^+y - R_\alpha Y_\delta\|_{H_1}^2 \Big)^{1/2}, \quad\text{where } y = Tx. $$



Proposition 3.15 (Finiteness of the mean squared error). Let $R_\alpha$, $\alpha > 0$, denote a regularization 
operator with associated regularization filter $F_\alpha$ satisfying (2.3). If the operator $T$ is 
Hilbert-Schmidt, the MSE of $R_\alpha$ is finite for all $x \in H_1$ and $\delta > 0$. 

Proof. By Parseval's identity and Fubini's theorem we get for all $y \in \mathcal{D}(T^+)$ the so called 
bias-variance decomposition of the mean squared error: 

$$ \mathbb{E}\,\|T^+y - R_\alpha Y_\delta\|_{H_1}^2 = \|T^+y - R_\alpha y\|_{H_1}^2 + \delta^2\,\mathbb{E}\,\|R_\alpha\Xi\|_{H_1}^2. \tag{3.4} $$

The first term is the squared bias, which is related to the approximation error and specifies the 
difference between the exact solution and the expectation value of its estimate. It is finite for all 
$y \in \mathcal{D}(T^+)$ and vanishes if $\alpha \to 0$ as we have shown in proposition 2.6. The variance measures the 
variability of the estimate caused by the noise. Applying the singular system $\{(s_j; v_j, u_j)\}_{j\in\mathbb{N}}$ of $T$ 
with $s_j \le \|T\|$ we get 

$$ \mathbb{E}\,\|R_\alpha\Xi\|_{H_1}^2 = \sum_{s_j>0} |F_\alpha(s_j^2)\, s_j|^2\, \mathbb{E}[\Xi_j^2] \le \sum_{s_j>0} |F_\alpha(s_j^2)\, s_j|^2 \le \frac{\gamma^2}{\alpha^2}\,\|T\|_{HS}^2, \tag{3.5} $$

since from $\|\mathrm{Cov}_\Xi\| \le 1$ it follows that $\mathbb{E}[\Xi_j^2] \le 1$ for all coordinates $\Xi_j := \langle \Xi, u_j\rangle_{H_2}$, $j \ge 1$. □ 
Assumption 2. In the following we assume the operator $T$ to be Hilbert-Schmidt and any 
considered regularization filter to satisfy (2.3). 

Remark 3.16. We stress that the bound in (3.5) does not yield optimal order. 
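
The bias-variance decomposition (3.4) and the bound (3.5) can be checked numerically in the sequence space model with Gaussian white noise. The sketch below assumes Tikhonov regularization (for which (2.3) holds with $\gamma = 1$) and hypothetical choices for the singular values, the signal, $\alpha$ and $\delta$; the Monte Carlo run is only a sanity check.

```python
import random

def tikhonov(alpha, theta):
    return 1.0 / (alpha + theta)

def mse_terms(x, s_vals, alpha, delta):
    """Squared bias and variance term of (3.4) for white noise (E[Xi_j^2] = 1)."""
    bias2 = sum(((1.0 - s * s * tikhonov(alpha, s * s)) * xj) ** 2
                for s, xj in zip(s_vals, x))
    var = delta ** 2 * sum((tikhonov(alpha, s * s) * s) ** 2 for s in s_vals)
    return bias2, var

def variance_bound(s_vals, alpha, delta, gamma=1.0):
    """delta^2 times the right-hand side of (3.5); gamma = 1 for Tikhonov."""
    hs_norm2 = sum(s * s for s in s_vals)
    return (gamma * delta / alpha) ** 2 * hs_norm2

def monte_carlo_mse(x, s_vals, alpha, delta, runs=2000, seed=0):
    """Empirical mean of ||x - R_alpha Y_delta||^2 over simulated observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        for s, xj in zip(s_vals, x):
            y_noisy = s * xj + delta * rng.gauss(0.0, 1.0)
            total += (xj - tikhonov(alpha, s * s) * s * y_noisy) ** 2
    return total / runs

s_vals = [1.0 / j for j in range(1, 51)]   # hypothetical Hilbert-Schmidt spectrum
x = [1.0 / j for j in range(1, 51)]        # hypothetical signal coefficients
alpha, delta = 1e-2, 5e-2

bias2, var = mse_terms(x, s_vals, alpha, delta)
print("bias^2 + variance :", bias2 + var)
print("Monte Carlo MSE^2 :", monte_carlo_mse(x, s_vals, alpha, delta))
print("bound from (3.5)  :", variance_bound(s_vals, alpha, delta))
```

In this setting the bound is far larger than the actual variance, in line with Remark 3.16.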



3.2 Regularization of statistical inverse problems 



To define convergent statistical regularization methods we need a reasonable handling of the stochas- 
tical noise when studying the asymptotic of a regularization method for 8 — > 0. As crucial point we 
recognize that not only the realization of the observations could vary for changing noise levels but 
even the underlying probability distribution could alter. 

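Remark 3.17 (Main idea: Linking the noise to the asymptotic of the noise level). For a chosen 
class of probability distributions $\mathcal{W}$ we consider the asymptotic behaviour of a regularization 
method $(R,\alpha)$ when the index $k \ge 1$ tends to infinity, i.e. we study 

$$ \lim_{k\to\infty} R_{\alpha(\delta^{(k)},\,Y^{(k)})}\, Y^{(k)}, \quad\text{where } Y^{(k)} := Y^{(k)}(y) := y + \delta^{(k)}\Xi^{(k)} \text{ with } P^{\Xi^{(k)}} \in \mathcal{W} $$

for $y \in \mathcal{D}(T^+)$, $\delta^{(k)} > 0$ and $\lim_{k\to\infty}\delta^{(k)} = 0$. 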



Example 3.18. 

• Let $P^\Xi$ be any probability distribution and $\mathcal{W} := \{P^\Xi\}$, i.e. we set $\Xi^{(k)} := \Xi$ for all $k \ge 1$. 
The assumed distribution can be interpreted as a priori knowledge of the noise behaviour. 
The most popular example of this approach are observations with Gaussian white noise. 

• By setting $\mathcal{W} := \{P^\Xi : \Xi \sim P^\Xi$ centered Hilbert-space process with $\|\mathrm{Cov}_\Xi\| < \infty\}$ we admit 
arbitrary observations $Y^{(k)} := y + \delta^{(k)}\Xi^{(k)}$ where $\Xi^{(k)}$ can be any Hilbert-space process 
satisfying assumption 1. Here the change to the stochastic context causes a loss of information. 

• As a compromise we could consider any subclass of $\mathcal{W}_0$, see (3.8), such as the Dirac measures or the 
centered normal distributions with bounded covariance. 

Remark 3.19 (Kinds of convergence). In order to formulate the desired definitions we still lack 
a suitable notion of convergence. In consideration of definition 3.7 there are basically three 
possibilities available: convergence in mean square, convergence in probability and convergence 
in distribution. The latter is too weak to yield useful results but convergence in probability should 
suffice for a lot of cases. Nevertheless the convergence in mean square is often preferred because of 
its technical advantages. One should decide as the case arises. 

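Definition 3.20 (Convergent statistical regularization). Let $R$ be a linear regularization scheme, 
$\alpha : (0,\infty) \times \mathbb{R}^{\mathbb{N}} \to (0,\infty)$ a measurable function and $\mathcal{W}$ a class of probability distributions. We set 

$$ M_{\mathcal{W}}(y) := \{ Y_\delta = y + \delta\Xi : \delta > 0,\ P^\Xi \in \mathcal{W} \text{ and } \|\mathrm{Cov}_\Xi\| \le 1 \} \quad\text{for any } y \in \mathcal{D}(T^+). \tag{3.6} $$

The pair $(R,\alpha)$ is called convergent statistical regularization w.r.t. $\mathcal{W}$ if for all $y \in \mathcal{D}(T^+)$ and 
arbitrary observations $\{Y_\delta\}_{\delta>0} \subseteq M_{\mathcal{W}}(y)$ the regularized solution converges $P$-stochastically to the 
exact one when $\delta \to 0$: 

For all $\{Y^{(k)}\}_{k\ge 1} \subseteq M_{\mathcal{W}}(y)$ with $Y^{(k)} := y + \delta^{(k)}\Xi^{(k)}$ and $\lim_{k\to\infty}\delta^{(k)} = 0$ we have 

$$ \lim_{k\to\infty} P\Big( \Big\{ \omega \in \Omega : \big\| T^+y - R_{\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}(\omega) \big\|_{H_1} > \varepsilon \Big\} \Big) = 0 \quad\text{for all } \varepsilon > 0. $$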
Remark 3.21. The convergence in probability could be replaced by the convergence in mean square. 
We call such schemes convergent statistical regularization in mean square w.r.t. $\mathcal{W}$. 

Example 3.22. 

Random variables: Hofinger and Pikkarainen study in [17, 18] convergence rates of the 
Tikhonov regularization using the Ky-Fan metric as error criterion and allowing only 
observations whose noise can be modeled as random variable. 

Statistical parameter choices: In addition to modifications of classic parameter choices, 
several strategies have been developed especially for the stochastic context. One of them was 
introduced by Lepskii in [22] and since then adapted to various models as for example 
statistical inverse problems with Gaussian white noise JsLB]. Another common parameter choice 
is cross-validation. In Tsybakov [31] it is presented in a regression model and in [33] one 
can find a $\delta$-free version. 

Gaussian white noise in the abstract model (3.1): In ^ the convergence in mean square of 
a Lepskii-type parameter choice applied to spectral cut-off is proven for observations with 
white noise. 

Gaussian white noise in the regression model (3.3): Mathé and Pereverzev have shown in 
[25] that Lepskii's procedure converges also with Tikhonov regularization. Our analysis in 
section 4 will be based on this study. That is why we want to outline briefly the crucial 
results. In [25] the authors focused on discretized data with random noise as described in 
notation 3.12. They assumed that: 

b) $x^+ \in T_\varphi(R) := \{x \in H_1 : x = \varphi(T^*T)v,\ \|v\| \le R\}$, where $\varphi : (0, \|T\|^2] \to \mathbb{R}_+$ is an 
increasing and operator monotone function with $\varphi(0) = 0$. 

c) The singular values of $T$ satisfy $s_j \asymp j^{-r}$ for all $j \ge 1$ and some $r > 0$. 

d) There is a constant $C \ge 1$ such that $\|(I-Q)T : H_1 \to H_2\| \le C\,\mathrm{rank}(Q)^{-r}$. 
Further, they set 

1) $R_\alpha := (\alpha I + B^*B)^{-1}B^*$ with $B := QT$ 

2) $\alpha_0 := \delta^2$ and $\alpha_j := \alpha_0 q^j$, where $q > 1$ and $j = 1, \dots, m := \lceil 2\log_q(\|T\|^2/\delta) \rceil$ 

3) $x_{j,\delta} := R_{\alpha_j} QY_\delta = (\alpha_j I + B^*B)^{-1}B^*QY_\delta(\omega)$ 

4) $n = n(\alpha) \asymp [\alpha^{-1/(2r)}]$ and $Q = Q_n$, where $\alpha > 0$ and $Q$ is the described orthogonal 
projection onto $\mathrm{span}(\{w_1, \dots, w_n\})$ 

5) Let $C_\Psi$, $C_1$ and $C_2 > 0$ be such that 

$\Psi(j)^2 := C_\Psi^2\,\alpha_j^{-1}\,\mathrm{rank}(Q) \ge \mathbb{E}[\|R_{\alpha_j}Q\Xi\|^2]$ (decreasing) and 
$\Phi(j) := C_1\varphi(C_2\alpha_j) \ge \|T^+y - R_{\alpha_j}QTx\|$ (increasing) with $j = 0, \dots, m$ 

satisfy $\delta\Psi(\alpha_0) > \Phi(\alpha_0)$. 
Now, the regularization parameter is chosen according to 

$$ \alpha_* := \alpha_{j_*} \quad\text{with}\quad j_* := \max\{ j = 1, \dots, m : \|x_{k,\delta} - x_{j,\delta}\| \le 4\kappa\delta\Psi(k) \text{ for all } k < j \}, \tag{3.7} $$

where $\kappa := \sqrt{m}$. The idea of this choice is to approximate the parameter $\alpha_{\mathrm{opt}}$ which satisfies 
$\delta\Psi(\alpha_{\mathrm{opt}}) = \Phi(\alpha_{\mathrm{opt}})$. Finally, we get with $\Theta(t) := t^{(2r+1)/(4r)}\varphi(t)$, $0 < t \le \|T\|^2$, and $\delta_0 > 0$ 
sufficiently small, that 



sup (E[||x-^. s || 2 ]) 1/2 <C allA /r21og (/ (^)lcp(0- 1 (|)), 5<5o, 
which converges to zero when $\delta \to 0$. 
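
The selection rule (3.7) can be sketched in the sequence space model. This is a simplified illustration and not the procedure of [25] itself: the discretization is fixed, the grid length is chosen here so that the largest parameter reaches $\|T\|^2$, the variance bound uses $\|R_\alpha\|^2 \le 1/(4\alpha)$ for Tikhonov regularization, and the fallback index 0 is an assumption for the case that no index qualifies. All data are hypothetical.

```python
import math
import random

def tikhonov_estimates(y_delta, s_vals, alphas):
    """x_{j,delta} for each alpha_j, written in the singular basis."""
    return [[s / (a + s * s) * yj for s, yj in zip(s_vals, y_delta)] for a in alphas]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def lepskii_index(estimates, psi, delta):
    """Largest j >= 1 with ||x_k - x_j|| <= 4*kappa*delta*psi[k] for all k < j."""
    m = len(estimates) - 1
    kappa = math.sqrt(m)
    chosen = 0  # fallback if no j >= 1 qualifies (an assumption of this sketch)
    for j in range(1, m + 1):
        if all(distance(estimates[k], estimates[j]) <= 4.0 * kappa * delta * psi[k]
               for k in range(j)):
            chosen = j
    return chosen

n, delta, q = 200, 1e-2, 1.5
s_vals = [1.0 / j for j in range(1, n + 1)]            # hypothetical spectrum
x_true = [1.0 / j ** 2 for j in range(1, n + 1)]       # hypothetical signal
rng = random.Random(1)
y_delta = [s * x + delta * rng.gauss(0.0, 1.0) for s, x in zip(s_vals, x_true)]

alpha0 = delta ** 2
m = math.ceil(math.log(1.0 / alpha0) / math.log(q))    # grid reaches ||T||^2 = 1
alphas = [alpha0 * q ** j for j in range(m + 1)]
# Variance bound: E||R_alpha Xi||^2 <= n / (4 alpha) for Tikhonov and white noise.
psi = [math.sqrt(n / (4.0 * a)) for a in alphas]

estimates = tikhonov_estimates(y_delta, s_vals, alphas)
j_star = lepskii_index(estimates, psi, delta)
print("alpha* =", alphas[j_star], " error =", distance(estimates[j_star], x_true))
```

The rule compares estimates only with each other and with the noise bound, so it needs $\delta$ but no knowledge of the smoothness function $\varphi$.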

Remark 3.23. For more details about the concepts of general source conditions and operator mono- 
tone functions we refer to 0,124, 25] and the references therein. 



3.3 Relation between classic and statistical regularization methods 



As justification for section 3.2 and as preparation of section 4 we are interested in the connection 
of regularization methods of the two settings. In general, we have to modify at least the parameter 
choice $\alpha$ because of the changed domain of definition. In order to formulate sufficient criteria for 
the stochastic convergence of $(R,\hat\alpha)$ we need to control the decay of $\hat\alpha(\delta, Y_\delta)$. The following 
notation will help us to describe it conveniently. 

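Notation 3.24 (Stochastic Landau symbol). Let $(Z_n)_{n\in\mathbb{N}}$ be a sequence of random variables on a 
probability space $(\Omega,\mathcal{F},P)$ and $(c_n)_{n\in\mathbb{N}}$ a sequence of real-valued constants. We denote 

$$ Z_n = o_P(c_n) \quad:\Longleftrightarrow\quad \lim_{n\to\infty} P\Big( \Big|\frac{Z_n}{c_n}\Big| > \varepsilon \Big) = 0 \quad\text{for all } \varepsilon > 0. $$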



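Proposition 3.25. Let $(R,\alpha)$ be any generally convergent regularization, 

$$ \mathcal{W} \subseteq \{P^\Xi : \Xi \sim P^\Xi \text{ centered Hilbert-space process with } \|\mathrm{Cov}_\Xi\| < \infty\} =: \mathcal{W}_0 \tag{3.8} $$

and $M_{\mathcal{W}}(y)$, $y \in \mathcal{D}(T^+)$, such as in (3.6). The modified method $(R,\hat\alpha)$ constitutes a convergent 
statistical regularization w.r.t. $\mathcal{W}$ for any measurable function $\hat\alpha : (0,\infty) \times \mathbb{R}^{\mathbb{N}} \to (0,\infty)$ if for 
arbitrary observations $\{Y^{(k)}\}_{k\ge 1}$ with $Y^{(k)} := y + \delta^{(k)}\Xi^{(k)} \in M_{\mathcal{W}}(y)$ it holds 

$$ \lim_{k\to\infty} P\big(\hat\alpha(\delta_k, Y_k) > \varepsilon\big) = 0 \text{ for all } \varepsilon > 0 \quad\text{and}\quad \big(\hat\alpha(\delta_k, Y_k)\big)^{-1} = o_P(\delta_k^{-1}). \tag{3.9} $$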



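Proof. Let $y \in \mathcal{D}(T^+)$, $\{Y^{(k)}\}_{k\ge 1} \subseteq M_{\mathcal{W}}(y)$ with $Y^{(k)} := y + \delta^{(k)}\Xi^{(k)}$ and $\lim_{k\to\infty}\delta^{(k)} = 0$. Proposition 
3.15 yields with assumption 2 for any number $\alpha > 0$ the finiteness of the mean squared error: 

$$ \mathbb{E}\,\big\|T^+y - R_\alpha Y^{(k)}\big\|_{H_1}^2 \le \|T^+y - R_\alpha y\|_{H_1}^2 + \Big(\frac{\gamma\delta^{(k)}}{\alpha}\Big)^2 \|T\|_{HS}^2 < \infty, \quad k \ge 1. $$

Now, we consider a measurable function $\hat\alpha : (0,\infty) \times \mathbb{R}^{\mathbb{N}} \to (0,\infty)$ satisfying (3.9) and insert in place 
of the number $\alpha$ the function value $\hat\alpha(\delta^{(k)}, Y^{(k)}(\omega))$, where $Y^{(k)}(\omega) = \{Y_j^{(k)}(\omega)\}_{j\ge 1}$ for $\omega \in \Omega$ and 
$k \ge 1$. In doing so we allow for a moment that the parameter choice and the regularization operator 
are applied to different realizations of $Y^{(k)}$, $k \ge 1$. We get from proposition 2.6 that 

$$ \lim_{k\to\infty} P\Big( \Big\{ \omega \in \Omega : \mathbb{E}\,\big\|T^+y - R_{\hat\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}\big\|_{H_1}^2 > \varepsilon \Big\} \Big) = 0 \tag{3.10} $$

for any $\varepsilon > 0$ since the sum of two stochastically convergent sequences converges stochastically. So, 
we can say: For all $\varepsilon > 0$ there exists a subset $\Omega_\varepsilon \subseteq \Omega$ with $P(\Omega_\varepsilon) \ge 1 - \varepsilon$, such that 

$$ \lim_{k\to\infty} \mathbb{E}\,\big\|T^+y - R_{\hat\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}\big\|_{H_1}^2 = 0 \quad\text{for all } \omega \in \Omega_\varepsilon \text{ with } P(\omega) > 0. $$

Further we can deduce that for all $\varepsilon, \eta > 0$ and $\omega \in \Omega_\varepsilon$ with $P(\omega) > 0$ 

$$ \lim_{k\to\infty} P\Big( \big\|T^+y - R_{\hat\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}\big\|_{H_1} > \eta \Big) = 0. $$

Finally, we achieve 

$$ \lim_{k\to\infty} P\Big( \Big\{ \omega \in \Omega : \big\|T^+y - R_{\hat\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}(\omega)\big\|_{H_1} > \eta \Big\} \Big) \le \varepsilon $$

for all $\varepsilon, \eta > 0$. Since $\varepsilon$ is independent of $\eta$, we can conclude stochastic convergence. □ 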



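Proposition 3.26. Any generally convergent regularization $(R,\alpha)$, where $\alpha$ is measurable, satisfies 
definition 3.20 of convergent statistical regularization w.r.t. 

$$ \mathcal{W}_1 \subseteq \mathcal{W}_2 := \{P^\Xi : \Xi \in L^2(\Omega, H_2) \text{ with } \mathbb{E}[\|\Xi\|_{H_2}^2] \le 1\}. \tag{3.11} $$

The converse holds if $\mathcal{W}_1$ contains the Dirac measures. 

Proof. Let $(R,\alpha)$ be a generally convergent regularization method with measurable $\alpha$, $y \in \mathcal{D}(T^+)$ 
and $\{Y^{(k)}\}_{k\ge 1}$ with $Y^{(k)} := y + \delta^{(k)}\Xi^{(k)}$, $P^{\Xi^{(k)}} \in \mathcal{W}_1$, $\delta^{(k)} > 0$ for all $k \ge 1$ and $\lim_{k\to\infty}\delta^{(k)} = 0$. We fix 
$\varepsilon > 0$, set $C := \varepsilon^{-1/2}$ and define for any $\omega \in \Omega$ the set 

$$ \mathcal{K}(\omega) := \{ k \ge 1 : \|\Xi^{(k)}(\omega)\|_{H_2} \le C \} \subseteq \mathbb{N} $$

and the number 

$$ \bar{k}(\omega) := \operatorname{argmin}\Big\{ k \ge 1 : \big\|T^+y - R_{\alpha(\delta^{(l)},\,Y^{(l)}(\omega))}\, Y^{(l)}(\omega)\big\|_{H_1} \le \varepsilon \ \ \forall\, l \in \mathcal{K}(\omega) \text{ with } l \ge k \Big\}. $$

Then it follows from Chebyshev's inequality and the convergence of $(R,\alpha)$ that 

$$ P\Big( \Big\{ \omega \in \Omega : \big\|T^+y - R_{\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}(\omega)\big\|_{H_1} > \varepsilon \Big\} \Big) \le P(\{\omega \in \Omega : k \notin \mathcal{K}(\omega)\}) + P(\{\omega \in \Omega : k < \bar{k}(\omega)\}) \le 2\varepsilon $$

for $k \ge 1$ sufficiently large and finally 

$$ \lim_{k\to\infty} P\Big( \Big\{ \omega \in \Omega : \big\|T^+y - R_{\alpha(\delta^{(k)},\,Y^{(k)}(\omega))}\, Y^{(k)}(\omega)\big\|_{H_1} > \varepsilon \Big\} \Big) = 0 \quad\text{for all } \varepsilon > 0. $$

□ 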

Proposition 3.27. Any purely data driven convergent statistical regularization $(R,\alpha)$ w.r.t. $\mathcal{W}_0$ 
induces a purely data driven generally convergent regularization $(R,\hat\alpha)$. 

Proof sketch. Let us contemplate deterministic observations of the form $y^{(k)} := y + \delta^{(k)}\xi^{(k)} \in H_2$ 
with $y \in \mathcal{D}(T^+)$, $\|\xi^{(k)}\|_{H_2} \le 1$, $\delta^{(k)} > 0$ for all $k \ge 1$ and $\lim_{k\to\infty}\delta^{(k)} = 0$. We define for any $k \ge 1$ the 
following Hilbert-space valued random variable 

$$ Y^{(k)}(\omega) := \begin{cases} y^{(k)}, & \text{if } \omega \in \Omega_1, \\ y - \delta^{(k)}\xi^{(k)}, & \text{if } \omega \in \Omega_2, \end{cases} $$



Regularization of statistical inverse problems and the Bakushinskii veto 






where $P(\Omega_1) = P(\Omega_2) = 0.5$. Every random variable $Y^{(k)}$, $k \ge 1$, can be identified with a centered 
Hilbert-space process, such that the function 

$$ \hat\alpha : H_2 \to (0,\infty), \quad y^{(k)} \mapsto \alpha\big(\{y_j^{(k)}\}_{j\ge 1}\big), $$

where $y_j^{(k)} := Y_j^{(k)}(\omega)$ for any $\omega \in \Omega_1$ and $j \ge 1$, constitutes with the regularization $R$ a purely 
data driven generally convergent regularization. □ 

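Remark 3.28. The proposition holds also for methods w.r.t. a subclass $\mathcal{W} \subseteq \mathcal{W}_0$ if $\mathcal{W}$ allows 
for arbitrary deterministic observations $\{y^{(k)}\}_{k\ge 1}$ of the above form the definition of a sequence 
$\{Y^{(k)}\}_{k\ge 1} \subseteq M_{\mathcal{W}}(y)$ with $P(\{\omega \in \Omega : Y^{(k)}(\omega) = y^{(k)}\}) \ge \eta$ for $\eta > 0$. 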



4 The Bakushinskii veto for statistical inverse problems 

The following study was motivated by the paper "Regularization independent of the noise level: an 
analysis of quasi-optimality" by Bauer and Reiß [4], which raised the question of the transferability 
of the Bakushinskii veto to statistical inverse problems. 

Theorem 4.1. 

1. A purely data driven convergent statistical regularization method w.r.t. $\mathcal{W}_0$, see (3.8), exists 
if and only if the range $\mathcal{R}(T)$ of the operator $T$ is closed. 

2. For certain probability distributions $P^\Xi$ there exist purely data driven convergent statistical 
regularization methods w.r.t. $\mathcal{W} := \{P^\Xi\}$ even if the problem is ill-posed. 

Remark 4.2 (Generalization). The first statement remains valid for sufficiently large subclasses of 
$\mathcal{W}_0$ such as $\mathcal{W}_2$ of (3.11) or the class of all Dirac measures. We refer to remark 3.28. 

For the proof of the second statement we need some preparation: 

Notation 4.3 (Setting). In order to construct an example supporting theorem 4.1 (2) let us 
focus on an operator $T : L^2([0,1]) \to L^2([0,1])$ and data with Gaussian white noise modeled by 
$Y_\delta(t) = Tx(t) + \delta\Xi_t$, $t \in [0,1]$, which is consistent with (3.1). We consider the equidistant 
decomposition $Z_n := (0 = t_0 < t_1 < \dots < t_n = 1)$ with $t_j := j/n$ for $j = 0, \dots, n$ and the orthonormal 
system $\{\varphi_j\}_{j=1,\dots,n}$, where $\varphi_j := (\sqrt{t_j - t_{j-1}})^{-1}\chi_{[t_{j-1},t_j)}$. By projecting $Y_\delta$ onto the linear span of 
$\{\varphi_j\}_{j=1,\dots,n}$ we get a finite set of coefficients 

$$ Y_{\delta,j} := \langle Y_\delta, \varphi_j\rangle_{L^2([0,1])} = (\sqrt{t_j - t_{j-1}})^{-1} \int_{t_{j-1}}^{t_j} Tx(s)\,ds + \delta\xi_j, \quad \xi_j \sim \mathcal{N}(0,1) $$

with $j = 1, \dots, n$, such that 

$$ QY_\delta(t) = \sum_{j=1}^{n} Y_{\delta,j}\,\varphi_j(t) = (t_j - t_{j-1})^{-1} \int_{t_{j-1}}^{t_j} Tx(s)\,ds + \sqrt{n}\,\delta\,\xi_j \quad\text{for } t \in [t_{j-1}, t_j). $$

Remark 4.4 (Outline). This setting conforms to the regression model with orthonormal design and
without repetitions as described in notation 3.12. In example 3.22 we mentioned that Tikhonov
regularization together with a Lepskii-type parameter choice forms a convergent statistical
regularization method $(R, \alpha)$ w.r.t. $\mathcal{N}(0,1)$ [25]. Plugging an estimate of the noise level into this
method, we can deduce a purely data driven one, as we will verify now.
For that purpose we want to use the following estimator:



Definition 4.5 (The estimator [14]).

$$\delta_n^2 := \frac{1}{2n^2} \sum_{j=1}^{n} \big( QY_\delta(t_j) - QY_\delta(t_{j-1}) \big)^2 \qquad (4.1)$$
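
A minimal Python sketch of this difference-based estimator follows. It assumes the normalisation $1/(2n^2)$, under which the Gaussian part of the squared differences has expectation $\delta^2$. The function name `estimate_noise_level`, the test signal $\sin(t)$ and the use of only $n-1$ neighbouring differences are assumptions of the sketch, not part of the definition.

```python
import math
import random


def estimate_noise_level(cell_values):
    """Return delta_n, the square root of the difference-based estimator (4.1)."""
    n = len(cell_values)
    squared_jumps = sum((b - a) ** 2
                        for a, b in zip(cell_values, cell_values[1:]))
    # Only n - 1 differences of neighbouring cells are available here
    # (an assumption of this sketch); the factor 1 / (2 n^2) follows (4.1).
    return math.sqrt(squared_jumps / (2.0 * n * n))


rng = random.Random(1)
n, delta = 2000, 0.05
# Hypothetical smooth signal Tx(t) = sin(t), sampled at the cell midpoints,
# plus the projected white noise sqrt(n) * delta * xi_j.
cells = [math.sin((j - 0.5) / n) + math.sqrt(n) * delta * rng.gauss(0.0, 1.0)
         for j in range(1, n + 1)]
delta_hat = estimate_noise_level(cells)
```

For a smooth signal the differences of the cell averages are small compared with the noise increments, so `delta_hat` should be close to `delta` once `n` is large.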

Before studying its asymptotic behaviour we recall the following notion:

Notation 4.6. (0]) Let $I$ denote an interval. A function $f : I \to \mathbb{R}$ is called Hölder continuous with
exponent $s \in (0, 1]$ if for all $t \in I$ a neighborhood $U \subset \mathbb{R}$ exists, such that

$$\sup_{t', t'' \in U \cap I,\ t' \neq t''} \frac{|f(t') - f(t'')|}{|t' - t''|^s} < \infty.$$

Assumption 3. Let $y = Tx \in L^2([0,1])$ be Hölder continuous of order $s \in (\tfrac{1}{2}, 1]$.

Example 4.7. Assumption 3 is satisfied for any integral operator $T : L^2([0,1]) \to L^2([0,1])$,

$$(Tx)(t) = \int_0^1 k(t,u)\,x(u)\,du,$$

with kernel $k : [0,1]^2 \to \mathbb{R}$ satisfying for some constant $C > 0$

$$\sup_{u \in [0,1]} |k(t,u) - k(t',u)| \le C\,|t - t'|^s, \qquad t, t' \in [0,1].$$



Conclusion 4.8. Assumption 3 implies, according to [29, pages 212-213], that

$$\big( Qy(t_{j-1}) - Qy(t_j) \big)^2 = \mathcal{O}(n^{-2s}), \qquad j = 1, \dots, n,$$

from which we can deduce the asymptotic unbiasedness of $\delta_n^2$ when $n \to \infty$:

$$E(\delta_n^2) = \delta^2 + \mathcal{O}(n^{-(1+2s)}) \qquad \text{and} \qquad E(\delta_n^2) \ge \delta^2.$$
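
A short calculation makes both claims plausible. It is a sketch under the assumptions that the estimator is normalised by $1/(2n^2)$, that the noise variables of neighbouring cells are independent standard Gaussians, and that the boundary cell is ignored. The symbol $a_j$ for the difference of neighbouring cell averages of $Tx$ is introduced here for illustration only.

```latex
QY_\delta(t_j) - QY_\delta(t_{j-1}) = a_j + \sqrt{n}\,\delta\,(\xi_{j+1} - \xi_j),
\qquad a_j^2 = \mathcal{O}(n^{-2s}),
\\[4pt]
E\Big[\big(a_j + \sqrt{n}\,\delta\,(\xi_{j+1} - \xi_j)\big)^2\Big] = a_j^2 + 2n\,\delta^2,
\\[4pt]
E(\delta_n^2) = \frac{1}{2n^2} \sum_{j=1}^{n} \big(a_j^2 + 2n\,\delta^2\big)
             = \delta^2 + \frac{1}{2n^2} \sum_{j=1}^{n} a_j^2
             = \delta^2 + \mathcal{O}(n^{-(1+2s)}) \;\ge\; \delta^2 .
```

The mixed term vanishes in expectation because $\xi_{j+1} - \xi_j$ is centred, and the bias is nonnegative because it is a sum of squares.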
Remark 4.9. In proposition 4.13 we need $s > \tfrac{1}{2}$.

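Proposition 4.10 (Concentration inequality). Let $n = n(\delta) \asymp \lceil \delta^{-\eta} \rceil$ with $2 > \eta > \frac{2}{1+2s}$, $\hat\delta := \tau\delta_n$
and

$$\Omega_+ := \Omega_+(\delta, \tau, \kappa) := \big\{ \omega \in \Omega : \hat\delta(\omega) \in [\delta, \tau\kappa\delta] \big\}, \qquad \tau, \kappa > 1 \text{ appropriate}. \qquad (4.2)$$

The following assertions hold for all $\delta \le \delta_0$ with $\delta_0 > 0$ sufficiently small:

1. There are constants $C_1, C_2 > 0$ such that $P(\Omega \setminus \Omega_+) \le C_1 \exp\big( -C_2\, n^2(\alpha,\delta)\, \delta^2 \big)$.

2. It holds for $\alpha > 0$ and some $C_3 > 0$ that

$$\sup_{T^+y \in T_\varphi(R)} \int_\Omega \| T^+y - R_\alpha QY_\delta \|^4 \, dP \le C_3\, \delta^4 \alpha^{-4} n^2(\alpha,\delta).$$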

We want to use the following lemma for the proof of proposition 4.10:

Lemma 4.11.

1. Let $X$ be a Gaussian random vector in a Banach space $\mathcal{X}$ and $\|X\|_p := \big( E[\|X\|_{\mathcal{X}}^p] \big)^{1/p}$, $0 < p < \infty$,
the $L^p$-norm of $X$. For all $0 < p, q < \infty$ there is a constant $K_{p,q} > 0$ such that

$$\|X\|_p \le K_{p,q}\, \|X\|_q.$$

2. Let $X \in L^4(\Omega, \mathcal{F}, P)$ be nonnegative. It holds

$$\int_\Omega X^4(\omega)\, dP(\omega) = 4 \int_0^\infty t^3\, P(X > t)\, dt.$$

3. $\|(\alpha I + B^*B)^{-1} B^*B\| \le 1$
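
The tail-integral identity in (2) can be checked numerically on a simple example. The Python sketch below evaluates the right-hand side with a midpoint rule. The function name `fourth_moment_via_tail` and the uniform test distribution are assumptions made for illustration.

```python
def fourth_moment_via_tail(survival, upper, steps=100000):
    """Approximate 4 * int_0^upper t^3 P(X > t) dt with the midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** 3 * survival(t)
    return 4.0 * h * total


# Hypothetical test case: X uniform on [0, 1], so P(X > t) = 1 - t on [0, 1]
# and the fourth moment E[X^4] equals 1/5.
approx = fourth_moment_via_tail(lambda t: 1.0 - t, upper=1.0)
```

For this test case the integral is $4(1/4 - 1/5) = 1/5$, which agrees with $E[X^4]$ computed directly.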

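Proof of lemma 4.11. The first statement can be found in [21, page 60] and the second one follows
by a generalized version of partial integration, which is given in [13, chapter 5, § 6]. By spectral
calculus we get the inequality in (3). □

Proof of proposition 4.10.

1. Using lemma 4.11 (1) we get with $\delta_n = \|X_n\|$, where

$$X_n := \frac{1}{\sqrt{2}\,n} \big( QY_\delta(t_1) - QY_\delta(t_0), \dots, QY_\delta(t_n) - QY_\delta(t_{n-1}) \big) \sim \mathcal{N}(EX_n, I_n),$$

that

$$\begin{aligned}
P(\Omega \setminus \Omega_+) &= P\big( \{ \omega \in \Omega : \tau\delta_n(\omega) < \delta \} \big) + P\big( \{ \omega \in \Omega : \tau\delta_n(\omega) > \tau\kappa\delta \} \big) \\
&= P\Big( K_{2,1}^{-1} \|\delta_n\|_2 - \delta_n > K_{2,1}^{-1} \|\delta_n\|_2 - \tfrac{\delta}{\tau} \Big) + P\Big( \delta_n - K_{1,2} \|\delta_n\|_2 > \kappa\delta - K_{1,2} \|\delta_n\|_2 \Big) \\
&\le P\Big( \big| E[\delta_n] - \delta_n \big| > K_{2,1}^{-1} \|\delta_n\|_2 - \tfrac{\delta}{\tau} \Big) + P\Big( \big| \delta_n - E[\delta_n] \big| > \kappa\delta - \|\delta_n\|_2 \Big),
\end{aligned}$$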



since the Cauchy-Schwarz inequality yields $K_{1,2} = 1$. At this point, we would like to apply
the concentration inequality (3.2) in [21, page 57], for which we have to ensure that $\tau \|\delta_n\|_2 > K_{2,1}\delta$
and $\kappa\delta > \|\delta_n\|_2$. The first requirement is satisfied for all $\tau > K_{2,1}$ as $\|\delta_n\|_2 \ge \delta$. For
the second we need that $\kappa \gg 1$ since we have for a constant $c > 0$

$$\|\delta_n\|_2 \le \delta + c\, n^{-(1+2s)/2} \asymp \delta + c\, \delta^{\eta(1+2s)/2} \le (c+1)\,\delta \qquad \text{for } \delta \in (0,1).$$

Supposing that $\tau$ and $\kappa$ are appropriate it follows that for some constants $C_1, C_2 > 0$

p(o\n + ) 



< 2exp \-^ n 2 (a,5) \K 2 \ ||8„|| 2 - f 
<Ciexp(-C 2 « 2 (a,8)8 2 ) . 



+ 2exp ( -Jr?i 2 (a,8) #8-||S„|| 2 
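2. By lemma 4.11 (2) and (3) and the concentration inequality (3.5) in [21, page 59] it holds
for some constant $C_3 > 0$ that

$$\begin{aligned}
\int_\Omega \| T^+y - R_\alpha QY_\delta \|^4 \, dP
&= 4 \int_0^\infty t^3\, P\big( \| T^+y - R_\alpha QY_\delta \| > t \big)\, dt \\
&\le 4 \int_0^{2\|T^+y\|_{H_1}} t^3\, dt + 4 \int_{2\|T^+y\|_{H_1}}^\infty t^3\, P\big( 2\|T^+y\|_{H_1} + \delta \| R_\alpha Q\xi \| > t \big)\, dt \\
&\le 16 \|T^+y\|_{H_1}^4 + 16 \int_{2\|T^+y\|_{H_1}}^\infty t^3 \exp\Big( -\frac{(t - 2\|T^+y\|_{H_1})^2}{E} \Big)\, dt \\
&= 16 \|T^+y\|_{H_1}^4 + 16 \int_0^\infty \big( t^3 + 6t^2 \|T^+y\|_{H_1} + 12 t \|T^+y\|_{H_1}^2 + 8 \|T^+y\|_{H_1}^3 \big)\, e^{-t^2/E}\, dt \\
&= 16 \|T^+y\|_{H_1}^4 + 16 \Big( \tfrac{1}{2} E^2 + \tfrac{3\sqrt{\pi}}{2} \|T^+y\|_{H_1} E^{3/2} + 6 \|T^+y\|_{H_1}^2 E + 4\sqrt{\pi}\, \|T^+y\|_{H_1}^3 \sqrt{E} \Big) \\
&\le 16 \|T^+y\|_{H_1}^4 + C_3\, \delta^4 \alpha^{-4} n^2(\alpha,\delta),
\end{aligned}$$

where $0 < s_j \le \|T\|$ yields

$$E := 8\, E\big[ \| \delta R_\alpha Q\xi \|^2 \big] = 8 \delta^2 \sum_{s_j > 0} \Big| \frac{s_j}{\alpha + s_j^2} \Big|^2 E\big[ |\langle Q\xi, u_j \rangle_{H_2}|^2 \big] \le 8 \delta^2 \|T\|^2 \alpha^{-2} n(\alpha,\delta).$$

The assertion follows for all $\delta \le \delta_0$ with $\delta_0 > 0$ sufficiently small.

□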
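
The last steps of this computation rest on the Gaussian moment integrals $\int_0^\infty t^k e^{-t^2/E}\,dt$ for $k = 0, \dots, 3$. The following Python sketch compares a midpoint-rule approximation with the closed forms $\tfrac{1}{2}E^2$, $\tfrac{\sqrt{\pi}}{4}E^{3/2}$, $\tfrac{1}{2}E$ and $\tfrac{\sqrt{\pi}}{2}\sqrt{E}$. The function name `gaussian_moment` and the test value of `E` are assumptions made for illustration.

```python
import math


def gaussian_moment(k, E, steps=200000):
    """Midpoint-rule approximation of int_0^inf t^k exp(-t^2 / E) dt."""
    upper = 12.0 * math.sqrt(E)  # the integrand is negligible beyond this point
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * math.exp(-((i + 0.5) * h) ** 2 / E)
                   for i in range(steps))


E = 2.0  # arbitrary positive test value
closed_forms = {
    3: E ** 2 / 2.0,
    2: math.sqrt(math.pi) * E ** 1.5 / 4.0,
    1: E / 2.0,
    0: math.sqrt(math.pi) * math.sqrt(E) / 2.0,
}
# Absolute deviation between quadrature and closed form for each power k.
errors = {k: abs(gaussian_moment(k, E) - v) for k, v in closed_forms.items()}
```

Multiplying these closed forms by the binomial coefficients $1, 6, 12, 8$ of $(t + 2\|T^+y\|)^3$ gives the four terms in the bracket.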



Remark 4.12 (Asymptotic behaviour of n). In example 3.22 we set $n \asymp \lceil \alpha^{-1/(2r)} \rceil$, $r > 0$, but in
proposition 4.10 it was more advantageous to link $n$ to $\delta$. Combining the two approaches we get

$$n := n(\alpha,\delta) = \max\{ n_1(\alpha), n_2(\delta) \}, \quad \text{where } n_1(\alpha) \asymp \lceil \alpha^{-1/(2r)} \rceil \text{ and } n_2(\delta) \asymp \lceil \delta^{-\eta} \rceil. \qquad (4.3)$$

Since this asymptotic behaviour of $n$ differs from the one stated in example 3.22, we have to
revise the convergence result.
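
As a small illustration, the combined choice of the discretization level can be written as follows. The sketch sets all constants hidden in the asymptotic relations to 1, and the function name `discretization_level` is hypothetical.

```python
import math


def discretization_level(alpha, delta, r, eta):
    """n(alpha, delta) = max{n_1(alpha), n_2(delta)} with all constants set to 1."""
    n1 = math.ceil(alpha ** (-1.0 / (2.0 * r)))  # n_1(alpha) ~ ceil(alpha^(-1/(2r)))
    n2 = math.ceil(delta ** (-eta))              # n_2(delta) ~ ceil(delta^(-eta))
    return max(n1, n2)


n_small_alpha = discretization_level(alpha=1e-6, delta=0.1, r=1.0, eta=1.5)
n_large_alpha = discretization_level(alpha=0.25, delta=0.1, r=1.0, eta=1.5)
```

For small $\alpha$ the term $n_1(\alpha)$ dominates, whereas for larger $\alpha$ the noise-driven term $n_2(\delta)$ determines the level.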

Proposition 4.13. Let $\alpha_* := \alpha_{j_*}(\delta, Y_\delta)$ denote the regularization parameter according to Lepskii's
principle as described in example 3.22. If we assume that $n = n(\alpha,\delta)$ as in (4.3) with $\eta < 2$ (instead
of $n = n(\alpha) \asymp \lceil \alpha^{-1/(2r)} \rceil$ as before), then

$$\lim_{\delta \to 0} \Big( \sup_{T^+y \in T_\varphi(R)} E\big[ \| T^+y - R_{\alpha_*} QY_\delta \|^2 \big] \Big) = 0.$$

Proof. Mathé and Pereverzev have shown in [25, Theorem 5] that under the assumptions and
notations of example 3.22 it holds for some $C_0 > 0$ that

$$\sup_{T^+y \in T_\varphi(R)} E\big[ \| T^+y - R_{\alpha_*} QY_\delta \|^2 \big] \le C_0 \sqrt{ \lceil 2 \log_q( \|T\|^2 / \delta ) \rceil }\; \varphi(\bar\alpha),$$

where $\bar\alpha := \alpha_{\bar j}$ with $\bar j := \max\{ j \le m : \varphi(\alpha_j) \le \delta \sqrt{\Psi(j)} \}$. The proof of this bound does not depend
on the asymptotic behaviour of $n$ aside from the requirement of the existence of a constant $D > 0$
satisfying $\Psi(j) \le D^2 \Psi(j+1)$ for all $j = 0, \dots, m-1$. Since this is fulfilled even for our new choice
of $n$, we cite the given inequality without further proof. The only modification which we made is a
slight change of the definition of $\bar j$, which simplifies the notation. Now, we want to prove that the
right hand side converges to zero. We follow the ideas in [25] and set

8 (O := max{[f 1/4s 'l , [S^ 2 ] }y/iy(t), t > 0, and a* := inf{a > : 8 (a) > 8} . 

©§ is increasing in t such that for every 5 > there is a unique choice for a*. We notice that 



8^(oc*) = 5CW^ = f 5 (max{\(a*) l/4s ],\& /2 ]}V¥Y < &cp(oc 



This leads to a < a* because of the definition of <J> and the monotonicity of <I> and *P. Finally, we 
can deduce 



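$$\lim_{\delta \to 0} \Big( \sup_{T^+y \in T_\varphi(R)} E\big[ \| T^+y - R_{\alpha_*} QY_\delta \|^2 \big] \Big) \le \lim_{\delta \to 0} C_0 \sqrt{ \lceil 2 \log_q( \|T\|^2 / \delta ) \rceil }\; \varphi(\alpha^*(\delta)) = 0 \qquad (4.4)$$

since $\lim_{\delta \to 0} \alpha^*(\delta) = 0$ if $\eta < 2$. □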



Remark 4.14. The convergence rate given in (4.4) need not be optimal.

Finally, we achieve:

Proof of theorem 4.1. Any purely data driven convergent statistical regularization method $(R, \alpha)$
w.r.t. $\mathcal{W}_0$ induces the existence of a purely data driven convergent regularization $(R, \alpha)$ in terms of
definition 2.12, as shown in the proposition preceding remark 3.28. If so, the range $\mathcal{R}(T)$ of $T$ is closed,
see lemma 2.14. So, we turn to the second statement:

We consider the setting described in notation 4.3 with $Tx$ satisfying assumption 3, the estimator $\delta_n^2$
given in definition 4.5 and the set $\Omega_+$ introduced in (4.2). Let $R_\alpha$, $m$, $\{\alpha_j\}_{j=0,\dots,m}$, $\{x_{j,\delta}\}_{j=0,\dots,m}$,
$T_\varphi(R)$, $\Psi$ and $\kappa$ be as in example 3.22 and $n := n(\alpha,\delta)$ as in (4.3). First of all we verify that
the assumptions of example 3.22 are satisfied. The first one follows by definition and the second if
$x^+ \in T_\varphi(R)$. The definition of the projection $Q$ and the Hölder continuity of $Tx$ yield by [29, pages
212-213] assumption (d) since

$$\| (I - Q) T : H_1 \to H_2 \| \le C\, \mathrm{rank}(Q)^{-s},$$

where $s \in (1/2, 1]$ denotes the Hölder exponent of $Tx$. Assumption (c) has been used in [25]
as basis of assumption (d) and in order to prove the order optimality of the convergence result, which
is why we can ignore it. As a consequence we set $r := s$ in $n_1(\alpha) \asymp \lceil \alpha^{-1/(2r)} \rceil$ such that

$$n_1(\alpha) \le n_1(\alpha_0) \asymp \delta^{-1/s} \le \delta^{-\eta} \asymp n_2(\delta) \qquad \text{if } \eta \ge 1/s.$$

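Now, we want to examine

$$E\big[ \| T^+y - R_{\hat\alpha_*} QY_\delta \|^2 \big] = \int_{\Omega_+} \| T^+y - R_{\hat\alpha_*} QY_\delta \|^2 \, dP + \int_{\Omega \setminus \Omega_+} \| T^+y - R_{\hat\alpha_*} QY_\delta \|^2 \, dP,$$

where $\hat\alpha_* := \alpha_{j_*}(\hat\delta, Y_\delta)$ with $\hat\delta := \tau\delta_n$ denotes the regularization parameter resulting from Lepskii's
principle (3.7) when using the estimated noise level. It is quite evident that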

/-> r- /-> r\ rs\ f r- /-> S||R a e„S(co)|| H , 

Q := ^k(o) := < 00 € Q. : max wr^ - < K 

[ y=l....,m(8) TsW 






if the constant $C_\Psi > 0$ in $\Psi$ is sufficiently large. As $\alpha_{j_*}(\delta, Y_\delta)$ and $\hat\alpha_*$ lead on $\Omega_+$ to the same
asymptotic behaviour of $R_\alpha QY_\delta$, we can deduce from proposition 4.13 that the first term on the
right vanishes when $\delta \to 0$ if $\eta < 2$. Furthermore, the Hölder inequality yields that

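$$\int_{\Omega \setminus \Omega_+} \| T^+y - R_{\hat\alpha_*} QY_\delta \|^2 \, dP \le \Big( \int_\Omega \| T^+y - R_{\hat\alpha_*} QY_\delta \|^4 \, dP \Big)^{1/2} \big( P(\Omega \setminus \Omega_+) \big)^{1/2}.$$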

Hence, it follows from proposition 4.10 that for all $\delta \le \delta_0$ with $\delta_0 > 0$ sufficiently small it holds
with $\eta := 1/s > \frac{2}{1+2s}$, where $s \in (\tfrac{1}{2}, 1]$, that

sup f \\T + y-R &t QY s \\ 2 dF<C a n\5-^]cxp(-^C2b 2 -^), 

T+yET< s ,(R)->n\&+ 

and finally

$$\lim_{\delta \to 0} \Big( \sup_{T^+y \in T_\varphi(R)} E\big[ \| T^+y - R_{\hat\alpha_*} QY_\delta \|^2 \big] \Big) = 0,$$

which completes the proof. □

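Remark 4.15 (Numerical procedure). The numerical procedure including the estimation of the
noise level can be described with the notations of example 3.22 as follows:

Choose: $\tau > K_{2,1}$; $p > 1$; $q > 1$; $n \in \mathbb{N}$; $m \in \mathbb{N}$; $\varepsilon > 0$; $\delta_0 := 0$; $k := 0$;
Do:     $k := k + 1$;
        $\delta_k^2 := \frac{1}{2n^2} \sum_{j=1}^{n} \big( y_\delta(j/n) - y_\delta((j-1)/n) \big)^2$;
        $\alpha := \delta_k^2$;
        $n := \max\{ n(\alpha, \delta_k),\ p \cdot n \}$;
While:  $(k < m)\,\varepsilon + (k \ge m)\,\max\{ |\delta_k - \delta_j| : j = k-m, \dots, k \} \ge \varepsilon\,\delta_k$;
Adapt:  $\kappa := \sqrt{m}$; $n = n(\alpha, \delta_k)$; $B := Q_n T$; $x_0 := (\alpha I + B^*B)^{-1} B^* y_\delta$; $k := 0$;
Do:     $k := k + 1$;
        $\alpha := q \cdot \alpha$;
        $n := n(\alpha, \delta_k)$;
        $B := Q_n T$;
        $x_k := (\alpha I + B^*B)^{-1} B^* y_\delta$;
While:  $\| x_j - x_k \| \le 4 \kappa \delta \sqrt{\psi(\alpha q^{j-k})}$ for all $j \le k$, and $\alpha \le \|T\|^2$;
Return: $x_{k-1}$;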

The second part is a modified version of the strategy presented in [25]. 
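
The first stage of the procedure, the estimation of the noise level on successively refined decompositions, can be sketched in Python as follows. This is a simplified illustration and not the procedure itself: the callable `observe`, the function name `estimate_until_stable`, the test signal $\sin(t)$ and the stopping rule (the last `m` estimates agree up to a relative tolerance `eps`) are assumptions standing in for the While-condition above, and the second stage (Tikhonov regularization with Lepskii's principle) is not included.

```python
import math
import random


def estimate_until_stable(observe, n0=50, p=2, m=3, eps=0.1, max_iter=10):
    """Refine the grid until the last m noise-level estimates agree up to eps.

    observe(n) is a hypothetical callable returning the n cell values of the
    projected data; the stopping rule is a simplified stand-in for the
    While-condition of the procedure.
    """
    n, history = n0, []
    for _ in range(max_iter):
        cells = observe(n)
        jumps = sum((b - a) ** 2 for a, b in zip(cells, cells[1:]))
        history.append(math.sqrt(jumps / (2.0 * n * n)))
        recent = history[-m:]
        if len(recent) == m and max(recent) - min(recent) <= eps * recent[-1]:
            break
        n = p * n  # refine the decomposition
    return history[-1], n


rng = random.Random(2)
delta = 0.05


def observe(n):
    # Hypothetical data: Tx(t) = sin(t) at the cell midpoints plus the
    # projected white noise sqrt(n) * delta * xi_j.
    return [math.sin((j - 0.5) / n) + math.sqrt(n) * delta * rng.gauss(0.0, 1.0)
            for j in range(1, n + 1)]


delta_hat, n_final = estimate_until_stable(observe)
```

The returned estimate would then play the role of the noise level in the parameter choice of the second stage, scaled by the factor $\tau$.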



5 Conclusion 

In this paper we have developed new concepts for the study of statistical inverse problems. The
central idea was to link the noise to the asymptotics of the noise level $\delta \to 0$, varying its probability
distribution, which is assumed to be an element of a fixed class $\mathcal{W}$ w.r.t. which the convergence of
the considered regularization is required. By means of this approach we were able to disprove the
often supposed general transferability of the Bakushinskii veto to the stochastic context.

A number of follow-up questions arise from this result. The estimation of the noise level gains in
importance. In particular, estimation methods which use just one data set are of special interest,
as the estimate can be incorporated into a regularization method. How do the various parameter
choices react to the use of an estimated noise level, and how can we compensate for unwanted
behaviour? For which other classes of probability distributions does a statement analogous to the
Bakushinskii veto hold, and for which ones can we derive counterexamples?